ABSTRACT


ALVINN  (Autonomous  Land  Vehicle  In  a  Neural  Network)  is  a  3-layer 
back-propagation  network  designed  for  the  task  of  road  following. 
Currently  ALVINN  takes  images  from  a  camera  and  a  laser  range  finder 
as  input  and  produces  as  output  the  direction  the  vehicle  should 
travel  in  order  to  follow  the  road.  Training  has  been  conducted 
using  simulated  road  images.  Successful  tests  on  the  Carnegie 
Mellon  autonomous  navigation  test  vehicle  indicate  that  the  network 
can  effectively  follow  real  roads  under  certain  field  conditions. 

The  representation  developed  to  perform  the  task  differs  dramatically 
when  the  network  is  trained  under  various  conditions,  suggesting  the 
possibility  of  a  novel  adaptive  autonomous  navigation  system  capable 
of tailoring its processing to the conditions at hand.


INTRODUCTION 


Autonomous  navigation  has  been  a  difficult  problem  for  traditional  vision  and  robotic 
techniques,  primarily  because  of  the  noise  and  variability  associated  with  real  world 
scenes. Autonomous navigation systems based on traditional image processing and pattern
recognition techniques often perform well under certain conditions but have problems
with  others.  Part  of  the  difficulty  stems  from  the  fact  that  the  processing  performed  by 
these  systems  remains  fixed  across  various  driving  situations. 

Artificial  neural  networks  have  displayed  promising  performance  and  flexibility  in  other 
domains  characterized  by  high  degrees  of  noise  and  variability,  such  as  handwritten 
character recognition [Jackel et al., 1988] [Pawlicki et al., 1988] and speech recognition
[Waibel et al., 1988]. ALVINN (Autonomous Land Vehicle In a Neural Network) is a
connectionist  approach  to  the  navigational  task  of  road  following.  Specifically,  ALVINN 
is  an  artificial  neural  network  designed  to  control  the  NAVLAB,  the  Carnegie  Mellon 
autonomous  navigation  test  vehicle. 


NETWORK  ARCHITECTURE 

ALVINN's current architecture consists of a single hidden layer back-propagation network
(See Figure 1). The input layer is divided into three sets of units: two “retinas” and a
single intensity feedback unit. The two retinas correspond to the two forms of sensory
input available on the NAVLAB vehicle: video and range information. The first retina,
consisting of 30x32 units, receives video camera input from a road scene. The activation
level of each unit in this retina is proportional to the intensity in the blue color band of
the corresponding patch of the image. The blue band of the color image is used because
it provides the highest contrast between the road and the non-road. The second retina,
consisting of 8x32 units, receives input from a laser range finder. The activation level of
each unit in this retina is proportional to the proximity of the corresponding area in the
image. The road intensity feedback unit indicates whether the road is lighter or darker
than the non-road in the previous image. Each of these 1217 input units is fully connected
to the hidden layer of 29 units, which is in turn fully connected to the output layer.
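The layer sizes described above can be checked with a small forward-pass sketch. It is
only an illustration of the stated dimensions (960 video units, 256 range units, 1 feedback
unit, 29 hidden units, 46 output units). The logistic activation function, the bias terms,
the random weight range and all function names are assumptions, since the text does not
specify them.

```python
import math
import random

VIDEO_UNITS = 30 * 32      # video camera retina
RANGE_UNITS = 8 * 32       # laser range finder retina
FEEDBACK_UNITS = 1         # road intensity feedback unit
N_INPUT = VIDEO_UNITS + RANGE_UNITS + FEEDBACK_UNITS   # 960 + 256 + 1
N_HIDDEN = 29
N_OUTPUT = 46              # 45 direction units + 1 feedback unit


def sigmoid(x):
    # Logistic unit (assumed; the text does not name the activation function).
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng, scale=0.1):
    # One weight row per receiving unit; the extra last entry is a bias (assumed).
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, activations):
    # Fully connected layer: weighted sum of all incoming activations plus bias.
    outputs = []
    for row in weights:
        total = row[-1]
        for w, a in zip(row, activations):
            total += w * a
        outputs.append(sigmoid(total))
    return outputs


def forward(video, range_img, feedback, w_hidden, w_output):
    # Concatenate the two retinas and the feedback unit into one input vector.
    inputs = list(video) + list(range_img) + [feedback]
    assert len(inputs) == N_INPUT
    hidden = layer_forward(w_hidden, inputs)
    return layer_forward(w_output, hidden)


rng = random.Random(0)
w_hidden = make_layer(N_INPUT, N_HIDDEN, rng)
w_output = make_layer(N_HIDDEN, N_OUTPUT, rng)
output = forward([0.5] * VIDEO_UNITS, [0.0] * RANGE_UNITS, 1.0, w_hidden, w_output)
```

With untrained random weights the output is meaningless. The sketch only confirms that
the three input groups add up to 1217 units and that the network yields 46 activations.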


The output layer consists of 46 units, divided into two groups. The first set of 45 units
is a linear representation of the turn curvature along which the vehicle should travel in
order to head towards the road center. The middle unit represents the “travel straight
ahead” condition while units to the left and right of the center represent successively
sharper left and right turns. The network is trained with a desired output vector of all
zeros except for a “hill” of activation centered on the unit representing the correct turn
curvature, which is the curvature that would bring the vehicle to the road center 7
meters ahead of its current position. More specifically, the desired activation levels for
the nine units centered around the correct turn curvature unit are 0.10, 0.32, 0.61, 0.89,
1.00, 0.89, 0.61, 0.32 and 0.10. During testing, the turn curvature dictated by the network
is taken to be the curvature represented by the output unit with the highest activation
level.
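The target encoding and the test-time decoding just described can be sketched as follows.
The nine hill values and the 45 direction units come from the text. The 0-based unit
numbering, the function names, and the choice to drop hill values that would fall past
either end of the output vector are assumptions made for illustration.

```python
N_DIRECTION = 45
HILL = [0.10, 0.32, 0.61, 0.89, 1.00, 0.89, 0.61, 0.32, 0.10]


def make_target(correct_unit):
    """Desired direction activations: zeros plus a hill centered on correct_unit."""
    target = [0.0] * N_DIRECTION
    half = len(HILL) // 2
    for offset, value in enumerate(HILL):
        idx = correct_unit - half + offset
        # Assumption: hill values falling outside the vector are simply dropped.
        if 0 <= idx < N_DIRECTION:
            target[idx] = value
    return target


def decode_direction(direction_activations):
    """Index of the direction unit with the highest activation level."""
    return max(range(len(direction_activations)),
               key=direction_activations.__getitem__)


# Unit 22 (0-based) is the middle of the 45 units, i.e. "travel straight ahead".
target = make_target(22)
```

Decoding with a plain maximum, as the text describes, ignores the shape of the hill. It
is used here only because that is the rule stated for testing.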

Figure 1: ALVINN Architecture

The final output unit is a road intensity feedback unit which indicates whether the road
is lighter or darker than the non-road in the current image. During testing, the activation
of the output road intensity feedback unit is recirculated to the input layer in the style
of Jordan [Jordan, 1988] to aid the network's processing by providing rudimentary
information concerning the relative intensities of the road and the non-road in the previous
image.
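The recirculation of the feedback unit during testing can be sketched as a loop over
successive images. The `network` argument is a hypothetical callable returning 46
activations, `stub_network` is a stand-in used only to make the sketch runnable, and the
neutral starting value of 0.5 for the first image is an assumption, since the text does not
say how the feedback input is initialized.

```python
N_DIRECTION = 45


def drive(frames, network, initial_feedback=0.5):
    """Run `network` over successive frames, recirculating the feedback output."""
    feedback = initial_feedback      # no previous image exists yet (assumed value)
    chosen_units = []
    for video, range_img in frames:
        output = network(video, range_img, feedback)
        direction = output[:N_DIRECTION]
        # Dictated turn curvature: the direction unit with the highest activation.
        chosen_units.append(max(range(N_DIRECTION), key=direction.__getitem__))
        # The output feedback unit becomes the input feedback for the next image.
        feedback = output[N_DIRECTION]
    return chosen_units


def stub_network(video, range_img, feedback):
    # Hypothetical stand-in for a trained network: always votes for the middle
    # unit and reports the mean video intensity on its feedback output.
    output = [0.0] * (N_DIRECTION + 1)
    output[22] = 1.0
    output[N_DIRECTION] = sum(video) / len(video)
    return output


units = drive([([0.2] * 4, []), ([0.8] * 4, [])], stub_network)
```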


TRAINING 

Training  on  actual  road  images  is  logistically  difficult,  because  in  order  to  develop  a 
general  representation,  the  network  must  be  presented  with  a  large  number  of  training 
exemplars  depicting  roads  under  a  wide  variety  of  conditions.  Collection  of  such  a 
data  set  would  be  difficult,  and  changes  in  parameters  such  as  camera  orientation  would 
require  collecting  an  entirely  new  set  of  road  images.  To  avoid  these  difficulties  we  have 
developed  a  simulated  road  generator  which  creates  road  images  to  be  used  as  training 
exemplars  for  the  network.  The  simulated  road  generator  uses  nearly  200  parameters 
in  order  to  generate  a  variety  of  realistic  road  images.  Some  of  the  most  important 
parameters are listed in Figure 2.

Figure  3  depicts  the  video  images  of  one  real  road  and  one  artificial  road  generated 
with  a  single  set  of  values  for  the  parameters  from  Figure  2.  Although  not  shown  in 
Figure  3,  the  road  generator  also  creates  corresponding  simulated  range  finder  images. 

•  size  of  video  camera  retina

•  3D  position  of  video  camera 

•  direction  of  video  camera 

•  field  of  view  of  video  camera 

•  size  of  range  finder  retina

•  3D  position  of  range  finder  camera 

•  direction  of  range  finder 

•  field  of  view  of  range  finder 

•  position  of  vehicle  relative  to  road  center 

•  road  direction 

•  road  curvature 

•  rate  of  road  curvature  change 

•  road  curve  length 

•  road  width 

•  rate  of  road  width  change 

•  road  intensity 

•  left  non-road  intensity 

•  right  non-road  intensity 

•  road  intensity  variability 

•  non-road  intensity  variability 

•  rate  of  road  intensity  change 

•  rate  of  non-road  intensity  change 

•  position  of  image  saturation  spots 

•  size  of  image  saturation  spots 

•  shape  of  image  saturation  spots

•  position  of  obstacles 

•  size  of  obstacles 

•  shape  of  obstacles

•  intensity  of  obstacles 

•  shadow  size 

•  shadow  direction 

•  shadow  intensity 

Figure  2:  Parameters  for  simulated  road  generator 

[Image panels: Real Road Image; Simulated Road Image]

Figure 3: Real and simulated road images


At  the  relatively  low  resolution  being  used  it  is  difficult  to  distinguish  between  real  and 
simulated  roads. 

Network training is performed using artificial road “snapshots” from the simulated road
generator and the Warp back-propagation simulator described in [Pomerleau et al., 1988].
Training involves first creating a set of 1200 different road snapshots by randomly varying
the parameters used by the simulated road generator. Back-propagation is then performed
using this set of exemplars until only asymptotic performance improvements appear likely.
During the early stages of training, the input road intensity unit is given a random
activation level. This is done to prevent the network from merely learning to copy the
activation level of the input road intensity unit to the output road intensity unit, since their
activation levels should almost always be identical because the relative intensity of the
road and the non-road does not often change between two successive images. Once the
network has developed a representation that uses image characteristics to determine the
activation level for the output road intensity unit, the network is given as input whether
the road would have been darker or lighter than the non-road in the previous image. Using
this extra information concerning the relative brightness of the road and the non-road,
the network is better able to determine the correct direction for the vehicle to travel.
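The two-phase treatment of the input road intensity unit can be sketched as a small
helper. The text does not say how long the "early stages" last or how lighter and darker
are encoded, so the `warmup_epochs` cutoff, the 1.0/0.0 encoding and the function name
are all assumptions.

```python
import random


def input_feedback_activation(epoch, previous_road_is_lighter, rng, warmup_epochs=10):
    """Activation given to the input road intensity unit for one training exemplar."""
    if epoch < warmup_epochs:
        # Early training: a random level, so the network cannot simply copy the
        # input feedback unit to the output feedback unit.
        return rng.random()
    # Later training: the true relative intensity for the previous image
    # (assumed encoding: 1.0 = road lighter, 0.0 = road darker).
    return 1.0 if previous_road_is_lighter else 0.0


rng = random.Random(0)
early = input_feedback_activation(0, True, rng)
late = input_feedback_activation(20, True, rng)
```

A fixed epoch cutoff is the simplest possible switch. The text instead describes switching
once the network has learned to derive the output feedback value from image
characteristics, which would need a performance-based criterion.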

PERFORMANCE 

Three methods are used to evaluate ALVINN's performance. The first test uses novel
artificial road images. After 40 epochs of training on the 1200 simulated road snapshots,
the network correctly dictates a turn curvature within two units of the correct answer
approximately 90% of the time on novel artificial road snapshots. The second, more
informative test involves “driving” down a simulated stretch of road. Specifically, the
artificial road generator has an interactive mode in which the road image scrolls in
response to an externally specified speed and direction of travel. After the training
described above, ALVINN can drive the artificial road generator at a constant speed on
trips of several miles without straying from the simulated road. The primary testing of
ALVINN's performance is conducted on the NAVLAB (See Figure 4). The NAVLAB
is a modified Chevy van equipped with 3 Sun computers, a Warp, a video camera, and
a laser range finder, which serves as a testbed for the CMU autonomous land vehicle
project [Thorpe et al., 1987]. Performance of the network to date is comparable to that
achieved by the best traditional vision-based autonomous navigation algorithm at CMU
under the limited conditions tested. Specifically, the network can accurately drive the
NAVLAB at a speed of 1/2 meter per second along a 400 meter path through a wooded
area of the CMU campus under sunny fall conditions. Under similar conditions on the
same course, the ALV group at CMU has recently achieved similar driving accuracy at
a speed of one meter per second by implementing their image processing autonomous
navigation algorithm on the Warp computer. In contrast, the ALVINN network is currently
simulated using only an on-board Sun computer, and dramatic speed-ups are expected
when tests are performed using the Warp.

Figure 4: NAVLAB, the CMU autonomous navigation test vehicle.
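The criterion used in the first test, a dictated turn curvature within two units of the
correct answer, can be written as a short scoring function. The function name and the
sample unit indices below are illustrative assumptions and are not data from the
experiments.

```python
def fraction_within(predicted_units, correct_units, tolerance=2):
    """Fraction of snapshots whose dictated unit is within `tolerance` units."""
    hits = sum(1 for p, c in zip(predicted_units, correct_units)
               if abs(p - c) <= tolerance)
    return hits / len(correct_units)


# Made-up unit indices: the errors are 0, 2, 3 and 0 units, so three of four count.
score = fraction_within([22, 20, 10, 30], [22, 22, 13, 30])
```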

NETWORK  REPRESENTATION 

The representation developed by the network to perform the road following task depends
dramatically on the characteristics of the training set. When trained on examples of roads
with a fixed width, the network develops a representation in which hidden units act as
filters for roads at different positions. Figures 5, 6 and 7 are diagrams of the weights
projecting to and from single hidden units in such a network.


Figure 5: Diagram of weights projecting to and from a typical hidden unit in a network
trained on roads with a fixed width. This hidden unit acts as a filter for a single road on
the left side of the visual field as illustrated by the schematic. [Diagram panels: Weight to
Output Feedback Unit; Weights to Direction Output Units; Weights from Range Finder Retina]

Figure 6: Diagram of weights projecting to and from a typical hidden unit in a network
trained on roads with a fixed width. This hidden unit acts as a filter for two roads, one
slightly left and one slightly right of center, as illustrated by the schematic. [Diagram
panels: Weight to Output Feedback Unit; Weights to Direction Output Units; Weights from
Range Finder Retina]

Figure 7: Diagram of weights projecting to and from a typical hidden unit in a network
trained on roads with a fixed width. This hidden unit acts as a filter for three roads, as
illustrated by the trimodal excitatory connections to the direction output units. [Diagram
panels: Weight to Output Feedback Unit; Weights to Direction Output Units; Weight from
Input Feedback Unit; Weights from Video Camera Retina]

As indicated by the weights to and from the feedback unit in Figure 5, this hidden unit
expects the road to be lighter than the non-road in the previous image and supports the
road being lighter than the non-road in the current image. More specifically, the weights
from the video camera retina support the interpretation that this hidden unit is a filter for
a single light road on the left side of the visual field (See the small schematic to the left of
the weights from the video retina in Figure 5). This interpretation is also supported by
the weights from the range finder retina. This hidden unit is excited if there is high range
activity (i.e. obstacles) on the right and inhibited if there is high range activity on the left
of the visual field where this hidden unit expects the road to be. Finally, the single road
filter interpretation is reflected in the weights from this hidden unit to the direction output
units. Specifically, this hidden unit makes excitatory connections to the output units on
the far left, dictating a sharp left turn to bring the vehicle back to the road center.

Figure 6 illustrates the weights to and from a hidden unit with a more complex
representation. This hidden unit acts as a filter for two roads, one slightly left and one slightly
right of center. The weights from the video camera retina along with the explanatory
schematic in Figure 6 show the positions and orientations of the two roads. This hidden
unit makes bimodal excitatory connections to the direction output units, dictating a
slight left or slight right turn. Finally, Figure 7 illustrates a still more complex hidden
unit representation. Although it is difficult to determine the nature of the representation
from the video camera weights, it is clear from the weights to the direction output units
that this hidden unit is a filter for three different roads, each dictating a different travel
direction. Hidden units which act as filters for one to three roads are the representation
structures most commonly developed when the network is trained on roads with a fixed
width.

The network develops a very different representation when trained on snapshots with
widely varying road widths. A typical hidden unit from this type of representation is
depicted in Figure 8. One important feature to notice from the feedback weights is that
this unit is filtering for a road which is darker than the non-road. More importantly, it
is evident from the video camera retina weights that this hidden unit is a filter solely
for the left edge of the road (See schematic to the left of the weights from the video
retina in Figure 8). This hidden unit supports a rather wide range of travel directions.
This is to be expected, since the correct travel direction for a road with an edge at a
particular location varies substantially depending on the road's width. This hidden unit
would cooperate with hidden units that detect the right road edge to determine the correct
travel direction in any particular situation.

DISCUSSION  AND  EXTENSIONS 

The distinct representations developed for different circumstances illustrate a key
advantage provided by neural networks for autonomous navigation. Namely, in this paradigm
the data, not the programmer, determines the salient image features crucial to accurate
road navigation. From a practical standpoint this data responsiveness has dramatically
sped ALVINN's development. Once a realistic artificial road generator was developed,
back-propagation produced in half an hour a relatively successful road following system.
It took many months of algorithm development and parameter tuning by the vision and
autonomous navigation groups at CMU to reach a similar level of performance using
traditional image processing and pattern recognition techniques.

Figure 8: Diagram of weights projecting to and from a typical hidden unit in a network
trained on roads with different widths.

More speculatively, the flexibility of neural network representations provides the
possibility of a very different type of autonomous navigation system in which the salient
sensory features are determined for specific driving conditions. By interactively training
the network on real road images taken as a human drives the NAVLAB, we hope to
develop a system that adapts its processing to accommodate current circumstances. This
is in contrast with other autonomous navigation systems at CMU [Thorpe et al., 1987]
and elsewhere [Dunlay & Seida, 1988] [Dickmanns & Zapp, 1986] [Kuan et al., 1988].
Each of these implementations has relied on a fixed, highly structured and therefore
relatively inflexible algorithm for finding and following the road, regardless of the conditions
at hand.

There are difficulties involved with training “on-the-fly” with real images. If the network
is not presented with sufficient variability in its training exemplars to cover the conditions
it is likely to encounter when it takes over driving from the human operator, it will not
develop a sufficiently robust representation and will perform poorly. In addition, the
network must not solely be shown examples of accurate driving, but also how to recover
(i.e. return to the road center) once a mistake has been made. Partial initial training on
a variety of simulated road images should help eliminate these difficulties and facilitate
better performance.

Another important advantage gained through the use of neural networks for autonomous
navigation is the ease with which they assimilate data from independent sensors. The
current ALVINN implementation processes data from two sources, the video camera and
the laser range finder. During training, the network discovers how information from
each source relates to the task, and weights each accordingly. As an example, range
data is in some sense less important for the task of road following than is the video
data. The range data contains information concerning the position of obstacles in the
scene, but nothing explicit about the location of the road. As a result, the range data
is given less significance in the representation, as is illustrated by the relatively small
magnitude weights from the range finder retina in the weight diagrams. Figures 5, 6 and
8 illustrate that the range finder connections do correlate with the connections from the
video camera, and do contribute to choosing the correct travel direction. Specifically, in
these three figures, obstacles located outside the area in which the hidden unit expects
the road to be located increase the hidden unit's activation level while obstacles located
within the expected road boundaries inhibit the hidden unit. However, the contributions
from the range finder connections aren't necessary for reasonable performance. When
ALVINN was tested with normal video input but an obstacle-free range finder image as
constant input, there was no noticeable degradation in driving performance. Obviously
under off-road driving conditions obstacle avoidance would become much more important
and hence one would expect the range finder retina to play a much more significant role
in the network's representation. We are currently working on an off-road version of
ALVINN to test this hypothesis.

Other current directions for this project include conducting more extensive tests of the
network's  performance  under  a  variety  of  weather  and  lighting  conditions.  These  will 
be  crucial  for  making  legitimate  performance  comparisons  between  ALVINN  and  other 
autonomous  navigation  techniques.  We  are  also  working  to  increase  driving  speed  by 
implementing  the  network  simulation  on  the  on-board  Warp  computer. 

Additional extensions involve exploring different network architectures for the road
following task. These include 1) giving the network additional feedback information by
using Elman's [Elman, 1988] technique of recirculating hidden activation levels, 2) adding
a second hidden layer to facilitate better internal representations, and 3) adding local
connectivity to give the network a priori knowledge of the two dimensional nature of the
input.

In the area of planning, interesting extensions include stopping for, or planning a path
around,  obstacles.  One  area  of  planning  that  clearly  needs  work  is  dealing  sensibly  with 
road  forks  and  intersections.  Currently  upon  reaching  a  fork,  the  network  may  output  two 
widely  discrepant  travel  directions,  one  for  each  choice.  The  result  is  often  an  oscillation 
in  the  dictated  travel  direction  and  hence  inaccurate  road  following.  Beyond  dealing  with 
individual  intersections,  we  would  eventually  like  to  integrate  a  map  into  the  system  to 
enable  global  point-to-point  path  planning. 

CONCLUSION 

More extensive testing must be performed before definitive conclusions can be drawn
concerning the performance of ALVINN versus other road followers. We are optimistic
concerning the eventual contributions neural networks will make to the area of autonomous
navigation.  But  perhaps  just  as  interesting  are  the  possibilities  of  contributions  in  the 
other  direction.  We  hope  that  exploring  autonomous  navigation,  and  in  particular  some  of 
the  extensions  outlined  in  this  paper,  will  have  a  significant  impact  on  the  field  of  neural 
networks.  We  certainly  believe  it  is  important  to  begin  researching  and  evaluating  neural 
networks  in  real  world  situations,  and  we  think  autonomous  navigation  is  an  interesting 
application  for  such  an  approach. 